Team, Visitors, External Collaborators
Overall Objectives
Research Program
Application Domains
Highlights of the Year
New Software and Platforms
New Results
Bilateral Contracts and Grants with Industry
Partnerships and Cooperations
Dissemination
Bibliography
XML PDF e-pub
PDF e-Pub


Section: New Results

Vocabularies, Semantic Web and Linked Data based Knowledge Representation

Modeling a Vocabulary of Professional Skills and Fields of Activities

Participants : Molka Dhouib, Catherine Faron Zucker, Andrea Tettamanzi.

In the framework of the collaborative project with Silex France company aiming to model the social network of service providers and companies, as a preliminary step, we developed a dedicated vocabulary of competences and fields of activities to semantically annotate B2B service offers. We started with the study of existing reference taxonomies representing skills, professions and fields of activities and we formalized them in SKOS. Then we built a SKOS vocabulary from the internal Silex repositories. Finally we performed a semi-automatic alignment of these vocabularies. This work has been presented at the French conference on Knowledge Engineering IC 2018 [53].

Representing and Querying a Knowledge Graph on Pedagogical Resources

Participants : Géraud Fokou Pelap, Catherine Faron Zucker, Fabien Gandon, Olivier Corby.

In the framework of the EduMICS (Educative Models Interactions Communities with Semantics) joint laboratory (LabCom) between the Wimmics team and the Educlever company, we built a knowledge graph from the database of the Educlever platform describing learning resources, and related knowledge and skills. We deployed our proposed Semantic Web based solution within the industrial environment of Educlever, using Web services, and we showed the added value of Semantic Web modelling enabling to implement new functionalities with SPARQL queries on the knowledge graph. This work has been presented at the SemWeb.Pro 2018 day [56] and at the WEBIST conference [34].

A Learnable Crawler for Linked Open Data

Participants : Hai Huang, Fabien Gandon.

This work is supported by the ANSWER project in cooperation with Qwant company. It consists of designing a learnable Linked Data crawler featured by a prediction component which is able to predict whether a newly discovered URI contains RDF data or not.

As the Web of Linked Open Data is growing exponentially, crawling for Linked Data has become increasingly important. Unlike normal Web crawlers, a Linked Data crawler performs selectively to collect linked RDF (including RDFa) data on the Web. From the perspectives of throughput and coverage, given a newly discovered URI, the key issue of Linked Data crawlers is to decide whether this URI is desirable to download (if it contains RDF data). Current solutions adopt heuristic rules aiming to filter irrelevant URIs. Unfortunately, it would hurt the coverage of crawling. In this work, we developed a learnable Linked Data crawler featured by a prediction component which is able to predict whether a newly discovered URI contains RDF data or not. We extracted useful features from the context RDF graph of the URI. The prediction model is based on FTRL-proximal (FTRL: Follow The Regularized Leader) online learning algorithm. We evaluated it through extensive experiments in comparison with a number of baseline methods and demonstrated its efficiency.

Argument Mining on Clinical Trials

Participants : Tobias Mayer, Serena Villata, Elena Cabrio.

This work was done in the context of the PhD of Tobias Mayer, which is situated in the IADB project, ”Intégration et Apprentissage sur les Données Biomédicales”. We created a new annotated dataset of Randomized Controlled Tirals (RCT) about four different diseases (glaucoma, diabetes, hepatitis B, and hypertension), containing 976 argument components (697 containing evidence, 279 claims) together with a first approach for the argumentative component detection [39]. Empirical results are promising and show the portability of the proposed approach over different branches of medicine. Furthermore, we proposed a new sub-task of the argument component identification task: evidence type classification, which distinguishes the provided evidence on a more fine-grained level. To address it, we proposed a supervised approach and we tested it on our data set [40].

As a collaboration with ”Base, Corpus, Language” (BCL) at UCA within the IADB project, we anonymized and cleaned clinical reports (from CHU Nice), built a ”raw” French corpus from it and are currently working on transferring the above mentioned annotations and models to this data set.

Structure Detection in Song Lyrics

Participants : Michael Fell, Elena Cabrio, Fabien Gandon.

In the context of the WASABI ANR project, we work on the estimation of the structure of song lyrics. For this, we have built a predictive model that successfully segments song texts into their underlying paragraphs - a task called ”Lyrics Segmentation”. We have augmented existing state-of-the-art models for Lyrics Segmentation in two ways: (i) by applying convolutional neural networks to the task alongside of novel feature representations. This work resulted in a publication at the COLING conference [33]; (ii) by extending the feature representation with time-synchronized audio features, we improve the segmentation model performance. It can now also use audio cues when text cues are non-indicative; this improves segmentation performance. Our current endeavors aim at summarizing song texts so that journalists and musicologists can perform efficient searches under different perspectives (e.g. structure and semantic content).

Legal Information, Privacy

Participants : Elena Cabrio, Serena Villata.

Together with Valentina Leone and Luigi di Caro (University of Torino), we presented the InvestigatiOnt tool which aims to ease the interaction of end users with legal ontologies in order to spread the use of machine-processable legal information as well as its understanding. This research is addressed in the context of the EU H2020 MIREL project. The results of this research have been published as demo paper at ISWC [71].

Together with Sabrina Kirrane (Vienna University of Economics and Business) and Matthieu d'Aquin (National University of Ireland Galway), we examined 78 articles from dedicated venues, the Privacy Online workshop series, two SPOT workshops, as well as the broader literature that connects the Semantic Web research domain with issues relating to privacy, security and/or policies. Specifically, we classified each paper according to three taxonomies (one for each of the aforementioned areas), in order to identify common trends and research gaps. We concluded by summarising the strong focus on relevant topics in Semantic Web research (e.g. information collection, information processing, policies and access control), and by highlighting the need to further explore under-represented topics (e.g., malware detection, fraud detection, and supporting policy validation by data consumers). The results of this research have been published in the Semantic Web journal [16].

Semantic Web for Biodiversity

Participants : Franck Michel, Catherine Faron Zucker.

The collaboration initiated with the French National Museum of Natural History of Paris (MNHN) is now giving rise to the development of an activity related to biodiversity data sharing and integration.

The TAXREF-LD linked data dataset, that we produced jointly with the MNHN, now appears in the Linked Open Data cloud (http://lod-cloud.net/) and is published on AgroPortal (http://agroportal.lirmm.fr/ontologies/TAXREF-LD/), the ontology Web portal for agronomy and agriculture. At the Biodiversity Information Standards conference (TDWG 2018), we presented some insights in the modelling of biodiversity Linked Data [45], we demonstrated how SPARQL Micro-Services can help in the integration of heterogeneous biodiversity-related data sources [43]. We also presented a poster on the Bioschemas.org initiative [46], a W3C community group that seeks the definition and adoption of common biology-related markup. In this context, we have proposed a first specification of the Taxon term (http://bioschemas.org/devSpecs/Taxon/) whose adoption as part of the official Schema.org vocabulary is currently being discussed with Google.

We took part in the D2KAB ANR project submission that aims to turn agronomy and biodiversity data into semantically described, interoperable, actionable open-knowledge. The project has been accepted and is due to start in June 2019.

Integration of Heterogeneous Data Sources

Participants : Franck Michel, Catherine Faron Zucker, Fabien Gandon.

With the incentive of fostering the integration of Linked Data and non RDF data sources, we published two contributions this year, together with Johan Montagnat from I3S.

First, we proposed a generic method to bridge the gap between the Semantic Web and NoSQL worlds [42]. To avoid defining yet another SPARQL translation method for each and every database, a SPARQL query is translated into a pivot abstract query, spanning all database-independent steps. Only then, the abstract query is translated into the target database query language while taking into account the specific database capabilities and constraints.

Second, we defined the SPARQL Micro-Service architecture that harnesses the Semantic Web standards to enable automatic combination of Linked Data and data residing in Web APIs (aka. REST Web services). A SPARQL micro-service is a lightweight, task-specific SPARQL endpoint that provides access to a small, resource-centric virtual graph, while dynamically assigning dereferenceable URIs to Web API resources that do not have URIs beforehand. The graph is delineated by the Web API service being wrapped, the arguments passed to this service, and the restricted types of RDF triples that this SPARQL micro-service is designed to spawn.

This work was presented at the ESWC conference [42] and the LDOW workshop at the Web Conference [44]. We also conducted an experimentation where we dynamically augment biodiversity Linked Data with data from multiple Web APIs: Flickr, Biodiversity Heritage Library, Encyclopedia of Life, Macauley scientific media archive, and MusicBrainz [43].

Linked Data Script Language

Participant : Olivier Corby.

We have designed and implemented LDScript, a programming language compatible with SPARQL that enables users to write extension functions that are directly executable in SPARQL queries.

We have leveraged pattern matching for structured objects such as lists where we can retrieve first elements, intermediate sublist and last elements. We have defined event driven processing where the SPARQL interpreter emits events which are processed by LDScript functions. The function definitions are annotated with event names. This enables users to trace query execution, to overload SPARQL statements such as ”order by, distinct” and to extend SPARQL with new statements implemented as functions. In particular we are able to overload SPARQL operators for extension datatypes such as romain numbers or values with units. We are also able to trap and overload SPARQL execution errors with specific LDScript functions. In addition, we have introduced a second order ”eval” function that enables us to evaluate the arguments of expressions that caused an error.

LDScript has been extended in order to process SPARQL Update in addition to SPARQL Query. Hence LDScript can be used to implement Semantic Web services with the following statements: SPARQL Query and Update, OWL RL entailment, RDF transformation to HTML.

This a follow up work on the formalism that was originally published at ISWC 2017 [72].

Graphic Display for RDF Graphs

Participants : Olivier Corby, Erwan Demairy.

This work has been done in the context of an Inria funding for software development (ADT).

In order to perform Linked Data visualisation, we connected the D3.js graphic display library to the Corese Web server. We designed an STTL transformation that generates D3 graph format with stylesheet from RDF graph. The graph display is performed thanks to SVG code generated by D3. The graph display can be interactive, that is hypertext navigation can be associated with a click on graph nodes. We have setup a demo with HAL open data server (http://corese.inria.fr), see figure 1.

Figure 1. Inria Sophia Antipolis teams publishing with south American countries.
IMG/sam.png

This work has been presented at the software development day at Inria Sophia Antipolis, November 14th.

Federated Query Scaler

Participant : Olivier Corby.

This work is done in the context of the Federated Query Scaler Inria exploratory research project (PRE) together with Olivier Dameron and Vijay Ingalalli from Dyliss team at Inria Rennes.

In this project, focused on SPARQL federated queries, Vijay Ingalalli designed a graph index for distributed SPARQL endpoints that enables us to predict whether joins between patterns can be performed within endpoints. We also wrote a compiler that generates a SPARQL query with service clauses from a federated query, that is a query annotated with several SPARQL endpoints URL.

We welcomed Vijay Ingalalli at Inria Sophia Antipolis, January 15-19, and Olivier Corby visited the Dyliss team in Rennes, March 4-6.